A Comparison of PCFG Models

Authors

  • Jose L. Verdú-Mas
  • Jorge Calera-Rubio
  • Rafael C. Carrasco
Abstract

In this paper, we compare three different approaches to building a probabilistic context-free grammar for natural language parsing from a treebank corpus: 1) a model that simply extracts the rules contained in the corpus and counts the number of occurrences of each rule; 2) a model that also stores information about the parent node's category; and 3) a model that estimates the probabilities according to a generalized k-gram scheme with k = 3. The last one allows faster parsing and decreases the perplexity of test samples.
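
As a rough illustration of the first two approaches, the sketch below estimates rule probabilities by relative frequency from a toy treebank, first with plain categories and then with each nonterminal annotated by its parent's category. The tree encoding, the `^` annotation syntax, the function names `rules` and `estimate`, and the two example trees are all assumptions made for this illustration and are not taken from the paper. The generalized k-gram model is not sketched, because the abstract does not describe it in enough detail.

```python
from collections import Counter

# Hypothetical tree encoding: (label, [children]); a leaf word is a plain string.
def rules(tree, parent=None, annotate=False):
    """Yield (lhs, rhs) rules from a tree, optionally annotating
    each nonterminal with its parent's category (e.g. NP^VP)."""
    label, children = tree
    lhs = f"{label}^{parent}" if annotate and parent else label
    rhs = []
    for child in children:
        if isinstance(child, str):
            rhs.append(child)                      # terminal word
        else:
            rhs.append(f"{child[0]}^{label}" if annotate else child[0])
    yield lhs, tuple(rhs)
    for child in children:
        if not isinstance(child, str):
            yield from rules(child, label, annotate)

def estimate(treebank, annotate=False):
    """Relative-frequency estimate: count(lhs -> rhs) / count(lhs)."""
    counts = Counter(r for t in treebank for r in rules(t, annotate=annotate))
    totals = Counter()
    for (lhs, _), n in counts.items():
        totals[lhs] += n
    return {rule: n / totals[rule[0]] for rule, n in counts.items()}

# Toy treebank, invented for the example.
treebank = [
    ("S", [("NP", ["she"]), ("VP", [("V", ["saw"]), ("NP", ["stars"])])]),
    ("S", [("NP", ["stars"]), ("VP", [("V", ["shine"])])]),
]

plain = estimate(treebank)                      # model 1: rules and counts only
annotated = estimate(treebank, annotate=True)   # model 2: parent-annotated categories

for (lhs, rhs), p in sorted(annotated.items()):
    print(f"{lhs} -> {' '.join(rhs)}  {p:.2f}")
```

In this toy data the plain model pools all NP expansions, while the parent-annotated model keeps separate distributions for NP under S and NP under VP, which is the kind of extra context the second approach stores.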

Similar resources

Final Project: a Comparison between Different Parsing Models

Section 2 discusses the formalizations of the different models, showing the ways to transform the certain treebank into preferred forms which the parser can use directly. We will mainly focus on the DOP model, since PCFG model in our view can be seen as the elemental case of DOP model and other models we will consider also follow the similar extraction method of PCFG with only the tree forms va...

Is it Really that Difficult to Parse German?

This paper presents a comparative study of probabilistic treebank parsing of German, using the Negra and TüBa-D/Z treebanks. Experiments with the Stanford parser, which uses a factored PCFG and dependency model, show that, contrary to previous claims for other parsers, lexicalization of PCFG models boosts parsing performance for both treebanks. The experiments also show that there is a big diff...

Appropriately Handled Prosodic Breaks Help PCFG Parsing

This paper investigates using prosodic information in the form of ToBI break indexes for parsing spontaneous speech. We revisit two previously studied approaches, one that hurt parsing performance and one that achieved minor improvements, and propose a new method that aims to better integrate prosodic breaks into parsing. Although these approaches can improve the performance of basic probabilis...

What is the PCFG? A review of available information

The majority of Eastern North Pacific gray whales migrate north in the spring to feeding grounds in the Bering, Chukchi, and Beaufort seas. However, each year some smaller portion of the population spends part (or all) of the feeding season farther south, between California and the Alaskan Peninsula. A number of whales which have been catalogued in these areas have shown inter-annual fidelity t...

Large-Scale Corpus-Driven PCFG Approximation of an HPSG

We present a novel corpus-driven approach towards grammar approximation for a linguistically deep Head-driven Phrase Structure Grammar. With an unlexicalized probabilistic context-free grammar obtained by Maximum Likelihood Estimate on a large-scale automatically annotated corpus, we are able to achieve parsing accuracy higher than the original HPSG-based model. Different ways of enriching the a...

Accurate and Robust LFG-Based Generation for Chinese

We describe three PCFG-based models for Chinese sentence realisation from Lexical-Functional Grammar (LFG) f-structures. Both the lexicalised model and the history-based model improve on the accuracy of a simple wide-coverage PCFG model by adding lexical and contextual information to weaken inappropriate independence assumptions implicit in the PCFG models. In addition, we provide techniques for...

Publication date: 2000